Distributed deduplication storage system based on Hadoop platform
LIU Qing, FU Yinjin, NI Guiqiang, MEI Jianmin
Journal of Computer Applications    2016, 36 (2): 330-335.   DOI: 10.11772/j.issn.1001-9081.2016.02.0330
Abstract
Data centers hold a large amount of redundant data, and backup data in particular wastes a great deal of storage space. To address this, a deduplication prototype based on the Hadoop platform was proposed. Deduplication technology, which detects and eliminates redundant data within a given data set, can greatly reduce the storage capacity required and improve the utilization of storage space. Using two big data management tools, the Hadoop Distributed File System (HDFS) and the non-relational database HBase, a scalable distributed deduplication storage system was designed and implemented. In this system, the MapReduce parallel programming framework performs deduplication in parallel, HDFS stores the data that remains after deduplication, and the index table is kept in HBase for efficient chunk fingerprint lookup. The system was tested with sets of virtual machine image files. The results demonstrate that the Hadoop-based distributed deduplication system achieves high throughput and excellent scalability while maintaining a high deduplication rate.
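The chunk fingerprint indexing described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the fixed 4 KB chunk size, the SHA-1 fingerprint, and the names fingerprint, deduplicate, restore, and dedup_ratio are all assumptions made for illustration. A Python dict stands in for the HBase index table and a list stands in for HDFS chunk storage; no MapReduce parallelism is modeled.

```python
import hashlib

# Assumed fixed chunk size; the abstract does not state the chunking method.
CHUNK_SIZE = 4096


def fingerprint(chunk: bytes) -> str:
    """Return a hex fingerprint of a chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).hexdigest()


def deduplicate(data: bytes, index: dict, store: list,
                chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and store only unseen chunks.

    index maps fingerprint -> position in store (stand-in for the HBase table).
    store holds unique chunk bytes (stand-in for HDFS).
    Returns the file recipe: the ordered fingerprints needed to rebuild data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = fingerprint(chunk)
        if fp not in index:
            # First time this content is seen: keep the chunk and index it.
            index[fp] = len(store)
            store.append(chunk)
        recipe.append(fp)
    return recipe


def restore(recipe: list, index: dict, store: list) -> bytes:
    """Rebuild the original bytes from a recipe of fingerprints."""
    return b"".join(store[index[fp]] for fp in recipe)


def dedup_ratio(logical_bytes: int, store: list) -> float:
    """Logical size divided by the physical size actually stored."""
    physical = sum(len(chunk) for chunk in store)
    return logical_bytes / physical if physical else 0.0


if __name__ == "__main__":
    index, store = {}, []
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
    recipe = deduplicate(data, index, store)
    assert restore(recipe, index, store) == data
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

In a distributed setting the dict lookup would become a remote index query, which is why the abstract emphasizes efficient fingerprint indexing: every chunk costs one lookup before it can be written or skipped.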